


Creators/Authors contains: "Shiao, William"

Note: Clicking on a Digital Object Identifier (DOI) number will take you to an external site maintained by the publisher. Some full-text articles may not be available free of charge until the embargo period ends.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available April 29, 2026
  2. Free, publicly-accessible full text available March 24, 2026
  3. Free, publicly-accessible full text available November 1, 2025
  4. Free, publicly-accessible full text available December 10, 2025
  5. Free, publicly-accessible full text available December 10, 2025
  6. Free, publicly-accessible full text available December 7, 2025
  7. Abstract: Tensor decompositions have proven to be effective in analyzing the structure of multidimensional data. However, most of these methods require a key parameter: the number of desired components. In the case of the CANDECOMP/PARAFAC decomposition (CPD), the ideal value for this parameter is known as the canonical rank and greatly affects the quality of the decomposition results. Existing methods use heuristics or Bayesian methods to estimate this value by repeatedly calculating the CPD, making them extremely computationally expensive. In this work, we propose FRAPPE, the first method to estimate the canonical rank of a tensor without having to compute the CPD. This method is the result of two key ideas. First, it is much cheaper to generate synthetic data with known rank compared to computing the CPD. Second, we can greatly improve the generalization ability and speed of our model by generating synthetic data that matches a given input tensor in terms of size and sparsity. We can then train a specialized single-use regression model on a synthetic set of tensors engineered to match a given input tensor and use that to estimate the canonical rank of the tensor, all without computing the expensive CPD. FRAPPE is over 24× faster than the best-performing baseline and exhibits a 10% improvement in MAPE on a synthetic dataset. It also performs as well as or better than the baselines on real-world datasets.
  8. Free, publicly-accessible full text available November 22, 2025
  9. Recent advances in deep learning have demonstrated the ability of learning-based methods to tackle very hard downstream tasks. Historically, this has been demonstrated in predictive tasks, while tasks more akin to the traditional KDD (Knowledge Discovery in Databases) pipeline have enjoyed proportionally fewer advances. Can learning-based approaches help with inherently hard problems within the KDD pipeline, such as how many patterns are in the data, what are different structures in the data, and how can we robustly extract those structures? In this vision paper, we argue for the need for synthetic data generators to empower cheaply-supervised learning-based solutions for knowledge discovery. We describe the general idea, early proof-of-concept results which speak to the viability of the paradigm, and we outline a number of exciting challenges that await, and a set of milestones for measuring success. 
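The FRAPPE abstract (item 7) and the vision paper (item 9) both describe the same paradigm: generate cheap synthetic data with known labels, train a lightweight model on it, and use that model on the real input. The sketch below illustrates that idea for rank estimation, assuming NumPy only; the feature choice, model (a plain least-squares regressor), and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def random_cp_tensor(shape, rank, rng):
    # Synthetic 3-way tensor with known CP rank: a sum of `rank` rank-one
    # outer products. Generating this is far cheaper than computing a CPD.
    factors = [rng.standard_normal((dim, rank)) for dim in shape]
    return np.einsum("ir,jr,kr->ijk", *factors)

def features(tensor):
    # Cheap summary statistics used as regression inputs (illustrative choice):
    # global moments plus leading singular values of one unfolding.
    flat = tensor.ravel()
    unfolding = tensor.reshape(tensor.shape[0], -1)
    sv = np.linalg.svd(unfolding, compute_uv=False)
    sv = sv / sv.max()
    return np.concatenate([[flat.std(), np.abs(flat).mean()], sv[:5]])

def estimate_rank(target, max_rank=8, n_samples=40, seed=0):
    # Train a single-use regressor on synthetic tensors matched to the
    # target's shape, then predict the target's canonical rank without
    # ever decomposing the target itself.
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_samples):
        r = int(rng.integers(1, max_rank + 1))
        X.append(features(random_cp_tensor(target.shape, r, rng)))
        y.append(r)
    X, y = np.asarray(X), np.asarray(y)
    design = np.c_[X, np.ones(len(X))]          # append bias column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    pred = np.c_[features(target)[None, :], [[1.0]]] @ coef
    return float(pred[0])
```

A real system in the spirit of FRAPPE would also match the input tensor's sparsity pattern and use a far stronger model; this sketch only shows why the synthetic-supervision step sidesteps the repeated CPD computations that make heuristic and Bayesian rank estimators expensive.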